Generalized Directed Loop Method for Quantum Monte Carlo Simulations 
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Efficient quantum Monte Cario update schemes caiied directed loops have recently been proposed, 
which improve the efficiency of simulations of quantum lattice models. We propose to generalize the 
detailed balance equations at the local level during the loop construction by accounting for the matrix 
elements of the operators associated with open world-line segments. Using linear programming 
techniques to solve the generalized equations, we look for optimal construction schemes for directed 
loops. This also allows for an extension of the directed loop scheme to general lattice models, such as 
high-spin or bosonic models. The resulting algorithms are bounce-free in larger regions of parameter 
space than the original directed loop algorithm. The generalized directed loop method is applied 
to the magnetization process of spin chains in order to compare its efficiency to that of previous 
directed loop schemes. In contrast to general expectations, we find that minimizing bounces alone 
does not always lead to more efficient algorithms in terms of autocorrelations of physical observables, 
because of the non- uniqueness of the bounce-free solutions. We therefore propose different general 
strategies to further minimize autocorrelations, which can be used as supplementary requirements 
in any directed loop scheme. We show by calculating autocorrelation times for different observables 
that such strategies indeed lead to improved efficiency; however we find that the optimal strategy 
depends not only on the model parameters but also on the observable of interest. 

PACS numbers: 02.70.Ss, 02.70.Tt, 75.10Jm 
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I. INTRODUCTION 

Monte Carlo (MC) simulations are powerful numer- 
ical tools for high-precision studies of many-body sys- 
tems, both in the classical and quantum regime. Espe- 
cially near second-order phase transitions, where physi- 
cal length scales diverge, it is essential to simulate large 
systems, which has become possible due to significant 
algorithmic advances within the last 15 years. 

In classical simulations, conventional MC algorithms 
sample the canonical partition function by making local 
configurational updates. While being straightforward, 
this approach turns out to slow down simulations near 
phase transitions and gives rise to long autocorrelation 
times in the measurement of the relevant physical observ- 
ables. For classical spin-like systems, this critical slowing 
down can be overcome using cluster algorithms , which 
update large clusters of spins in a single MC step. 

The generalization of these non-local update schemes 
to the case of quantum Monte Carlo (QMC) simulations 
was initiated by the development of the loop algorithm 
in the world- line representation 0, ■ This very efficient 
method has been used in many studies, where it allowed 
the simulation of large systems at very low temperatures. 
In the original formulation (either in discrete j3] or con- 
tinuous imaginary time), the loop algorithm however 
has a major drawback: to work efficiently, its application 
is restricted to specific parameter regimes. In the case of 
quantum spin models for example, it suffers from severe 
slowing down upon turning on a magnetic field 

This problem can be circumvented by performing clus- 
ter constructions in an extended configuration space, 
which includes world line configurations with two open 
world line fragments |(| , representing physical operators 



inserted into a MC configuration. The resulting worm 
algorithm proceeds by first creating a pair of open 
world line fragments (a worm). One of these fragments 
is then moved through space-time, using local Metropo- 
lis or heat bath updates until the two ends of the worm 
meet again. This algorithm thus consists of only local up- 
dates of worm ends in the extended configuration space, 
but can perform non-local changes in the MC configu- 
rations. The cluster generation process of the worm al- 
gorithm allows for self intersections and backtracking of 
the worm, which might undo previous changes. While 
not being as efficient as the loop algorithm in cases with 
spin-inversion or particle-hole symmetry, the great ad- 
vantage of the worm algorithm is that it remains efficient 
in an extended parameter regime, e.g. in the presence of 
a magnetic field for spin models [5| . 

An alternative QMC approach, which is not based 
upon the world-line representation, is the stochastic 
series expansion (SSE) 0, a generalization of Hand- 
scomb's algorithm || for the Heisenberg model. While 
in the original implementation Q local MC updates were 
used, Sandvik later developed a cluster update, called 
the operator-loop update for the SSE representation [ij, 
which allows for non-local changes of MC configurations. 
Within this SSE approach one can efficiently simulate 
models for which the world line loop algorithm suffers 
from a slowing down. Furthermore, loop algorithms are 
recovered in models with spin-inversion or particle hole 
symmetry Q. In fact the two approaches, world- line and 
SSE QMC are closely related [lO, EH 

Recently, it has been realized that the rules used to 
construct the operator-loops in the original implemen- 
tation 0, were just one possible choice and that one 
can consider generalized rules, which give rise to more 
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efficient algorithms 0, ^| . This new approach led to 
the construction of the "directed loop" update scheme 
by Syljuasen and Sandvik, first for spin-1/2 systems [l2j . 
Later it was adapted to general spin-S* models by Harada 
and Kawashima through a coarse-grained picture of the 
loop algorithm [T3 |. Using a Holstein-Primakov trans- 
formation of the large spin-S" algorithm, a coarse-grained 
loop algorithm for softcore bosonic models was also devel- 
oped [l5j . The improvements achieved using the directed 
loop ap pro ach have been demonstrated in various recent 
studies [T3, El, Eal ■ For a recent review of non-local 
updates in QMC, see Ref. [20| . 

In this work, we show that the equations which deter- 
mine the directed loop construction allow for additional 
weight factors, which were not considered by Syljuasen 
and Sandvik or used in 0, 0|. We explain how 
these weight factors naturally arise from a formulation 
of the directed loop construction within the extended 
configuration space. Instead of viewing the worm ends 
as link discontinuities we consider them to repre- 
sent physical operators with in general non-unity ma- 
trix elements 0, 0|. Taking these natural weight fac- 
tors into account, and numerically optimizing solutions 
to the generalized directed loop equations, we are able 
to construct algorithms which display larger regions in 
parameter space where the worm propagation is bounce 
free, i.e. a worm never backtracks. This numerical ap- 
proach allows for the implementation of directed loop al- 
gorithms in a generic QMC simulation code, which is not 
restricted to specific models. In particular, it provides 
directed loop algorithms for spin-S" and softcore boson 
systems directly using the numerical solution, without 
coarse graining, or using the split-spin representation and 
Holstein-Primakov transformation [lj, [l5j . 

Performing simulations and calculating autocorrela- 
tion times of different observables, we find that mini- 
mizing bounces does not necessarily imply more efficient 
algorithms. In certain cases, the generalized directed 
loop algorithm presented in this paper has superior per- 
formances to the standard directed loop scheme, but in 
other cases it does not. We identify the non-uniqueness 
of bounce minimized solutions as the source of this ob- 
servation: in the general case there are many solutions to 
the directed loop equations with minimal bounces, which 
however do not lead to the same performance of the al- 
gorithm. 

To further improve the algorithm, we thus propose var- 
ious additional strategies, which select out certain so- 
lutions in the subset of those which minimize bounces. 
These additional strategies can also be used in the stan- 
dard directed loop approach. Calculating again autocor- 
relation times with these different strategies, we indeed 
find that the efficiency can be further improved largely. 
However, we find that the specific strategy that gives rise 
to the best performances depends on the specific Hamil- 
tonian and on the observable of interest. The conclusion 
we reach from these results is that in most cases short 
simulations on small systems are needed in order to iden- 



tify the optimal strategy before performing production 
runs. 

Throughout this work, we use the SSE QMC scheme 0, 
in order to present our framework, since it appears to 
be the more natural approach to many problems. How- 
ever, the ideas presented here can as well be implemented 
within the path integral approach [l2T |. 

The outline of the paper is as follows: We review in 
section[H]thc SSE method and the operator-loop update 
scheme. Then we introduce the generalized directed loop 
equations in in section ITTT1 In section llVl we show how to 
numerically solve these equations in order to obtain di- 
rected loop schemes with minimized bounces using linear 
programming techniques. In section [3 we present algo- 
rithmic phase diagrams obtained within the framework 
proposed here, and compare them with those obtained 
using previous directed loop schemes. We discuss in sec- 
tion IVII results on autocorrelation times obtained from 
the simulation of the magnetization process of various 
quantum spin chains. Our results indicate, that mini- 
mizing bounces alone does not necessary lead to reduced 
autocorrelations of physical observables. We therefore 
introduce in Sec. IVIII supplementary strategies in order 
to improve the performance of directed loop algorithms, 
and present autocorrelation times obtained using these 
additional strategies. We finally conclude in Sec. IVIlil 

II. STOCHASTIC SERIES EXPANSION 
A. Presentation of the method and notations 

The SSE QMC method was first introduced by Sandvik 
and Kurkijarvi Q. In this original implementation local 
MC updates schemes were employed. Later Sandvik de- 
veloped the operator- loop update Q , which has recently 
been improved by employing the idea of directed loops 
|l2j . Before discussing our scheme, which steams from 
an extension of these ideas, we review in this section the 
formulation of the SSE method, and the extended config- 
uration space interpretation of the operator-loop update. 

To develop the SSE QMC scheme, we start from a high 
temperature series expansion of the partition function 

00 on 

Z = Tre-P* = E ^-H(-#)>> (1) 

n— ol 

where H denotes the Hamiltonian and {\ct}} a Hilbcrt 
space basis of the system under consideration. The SSE 
approach aims to develop an importance sampling frame- 
work for the terms contributing to the partition function 
for a given temperature T = 1/0. 

The resulting Monte Carlo scheme can be applied to a 
large variety of Hamiltonians, including multiple-particle 
exchange and long-ranged interaction terms. Here we 
restrict ourselves to models with on-site and short-ranged 
two-site interactions in order to simplify the following 
discussion. 
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The Hamiltonian can then be decomposed into a sum 
of bond Hamiltonians 



M 



(2) 



6=1 



where each term H b is associated with one of the M 
bonds of the lattice, 6 = (i(b), j(b)), connecting lattice 
sites i(b) and j(b). 

We assume that all contributions to H involving on-site 
terms have been expressed as additional two-site terms 
within the H b . For example, a chemical potential term 
for two sites, (Mii and ixrij, can be added to Huj\ as 
H^Kirii + KjUj), with suitable constants Ki,Kj, assuring 
the sum over all such terms recovers the initial sum. 

Inserting this decomposition of the Hamiltonian into 
the partition function we obtain 

oo an A 

2 = EE E Snw^Kp- 1 ))' (3) 

»=0 {C„} o(0),...,o(A) ' P=l 

where {C n } denotes the set of all concatenations of n 
bond Hamiltonians H b , each called an operator string. 
We have furthermore inserted sets \a(p)) of Hilbert space 
basis vectors between each pair of consecutive bond 
Hamiltonians. Therefore \u(p)) is the state that results 
after applying the first p bond Hamiltonians in the oper- 
ator string to the initial state |a(0)): 



«(p))=n^i«(o)>. 



(4) 



Furthermore |a(A)) = |a(0)), reflecting the periodicity in 
the propagation direction. In the following we also denote 
by \cti{p)) the local state at site i given the state vector 
\a(p)), so that \a(p)) = |o<i(p)} <g> \a 2 (p)) ® ... <8> |cw,(p)), 
where N s denotes the number of lattice sites. 

For a finite system and at finite temperature the rele- 
vant exponents of this power series are centered around 



(n) oc N s [3. 



(5) 



Hence we can truncate the infinite sum over n at a finite 
cut-off length A oc N s /3 without introducing any system- 
atic error for practical computations. The best value for 
A can be determined and adjusted during the equilibra- 
tion part of the simulation, e.g. by setting A > (4/3)n 
after each update step. 

In order to retain a constant length of the operator 
strings in the truncated expansion of Eq. Q we insert 
(A — n) unit operators Id into every operator string of 
length n < A, and define Hq = Id. Taking the number 
of such possible insertions into account, we obtain 



EE E 

n=0 {C A } o(0),...,o(A) 



(3 n (k-n)\ 



A! 



H(a(p)\H bp \a(p-l)), 

P =i 

(6) 



where n now denotes the number of non-unity operators 
in the operator string C\- Each such operator string 
is thus given by an index sequence Ca = (61,62, ■•■,6a), 
where on each propagation level p = 1 , . . . , A either b p = 
for an unit operator, or 1 < b p < M for a bond Hamilto- 
nian. 

Instead of evaluating all possible terms in the expan- 
sion of Eq. JO}, in a SSE QMC simulation one attempts 
to importance sample over all contributions to Eq. © 
according to their relative weight. In order to interpret 
these weights as probabilities, all the matrix elements of 
each bond Hamiltonian H b should be positive or zero. 
Concerning the diagonal part of the Hamiltonian, one 
can assure this by adding a suitable constant C to each 
bond Hamiltonian. The constant C can be decomposed 
as C = Co + e, where Co is the minimal value for which 
all diagonal matrix elements are positive, and an addi- 
tional offset e > 0. The effects of a finite value for e on 
the efficiency of the SSE algorithm will be discussed in 
section IVI 

For the non-diagonal part of the Hamiltonian an 
equally simple remedy does not exist. However, if only 
operator strings with an even number of negative matrix 
elements have a finite contribution to Eq. ©, the rela- 
tive weights are again well suited to define a probability 
distribution. One can show that this is in general the 
case for bosonic models, ferromagnetic spin models, and 
antifcrromagnetic spin models on bipartite lattices. 

Given the positivity of the relative weights, one then 
has to construct efficient update schemes, that generate 
new configurations from a given one. Within SSE sim- 
ulations that employ operator-loop updates, each Monte 
Carlo step consists of two parts. In the first step attempts 
are made to change the expansion order n by inserting 
and removing the number of unit operators. During this 
update step, all propagation levels p = 1, A are tra- 
versed in ascending order. If the current operator is an 
unit operator Hq it is replaced by a bond Hamiltonian 
with a certain probability which guarantees detailed bal- 
ance. The reverse process, i.e. substitution of a bond 
Hamiltonian by a unit operator is only attempted if the 
action of the current bond Hamiltonian does not change 
the propagated state, i.e., if \a{p)) = \a(p — 1)), since 
otherwise the resulting contribution to Eq. © would 
vanish. 

The acceptance probabilities for both substitutions, as 
determined from detailed balance, are 



P(H ^H b ) =min 
P(H b ^H ) =min 



1, 



MP(a{p)\H b \a(p-l)) 
A - n 

(A - n + l)<5|a(p)),|q(p-i)) 
Ml3(a(p)\H b \a(p-l)) 



The second part of a MC update step consists of per- 
forming a certain fixed number of operator-loop updates, 
modifying the configuration obtained from the preced- 
ing diagonal update. Keeping the expansion order n un- 
changed, attempts are made to change the intermedi- 
ate state vectors \a(p)). Most importantly, the employed 
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M 3 )> = K(yto> 



AAA/VW 



M 4 )> = \ a j(b p )(j>)) 



K2)>= |« i(6p )(p-l)> 



Site i(6m) 



Site i(fc p ) 



FIG. 1: The vertex state S p is equal to the direct product 
of the local states on its four legs: S p = cr(l)) g> |c(2)) ® 

k(3)> ® k(4)>. 
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FIG. 2: The two possible ways of inserting a pair of operators 
on a state s: (a) insertion of a pair AqA , (b) insertion of a 
pair AlA . 



cluster updates significantly reduce autocorrelations be- 
tween successive MC configurations. 

The operator-loop update makes use of a representa- 
tion for the operator string Ca as a quadruply-linked 
list of vertices, each vertex being associated with a non- 
unity operator in the operator string. To construct this 
representation, consider a propagation level p with a cor- 
responding bond Hamiltonian H b . Since the bond b p 
connects two-lattice sites i(b p ) and j{b p ), we can repre- 
sent it by a four-leg vertex, where the legs carry the local 
states on sites i(b p ) and j(b p ), given by 1 01,(1, ) (p— 1)), 
respectively \a j{bp) (p-l)) before, and by \a^ bp )(p)), re- 
spectively \ctj(b p )(j>)) after the action of the bond Hamil- 
tonian Hb , see Fig. ^ We denote the direct product of 
the four states on the legs of a vertex by 

S=K1))®K2))®K3))®K4)) (7) 

so that on the propagation level p the vertex state is 

S P = l"i(6 p )(p-l))«)|O3(6 J ,)0-l)>®|ai(6 p )(p))®|Oj(6 p )(p)}- 

In general, given the state |E) of a vertex with an asso- 
ciated bond b, we define the weight of this vertex by 

W(b, S) - (((7(3)1 ® (a(A)\)H b (\a(l)) ® cr(2)», (8) 

where if;, is the restriction of the bond Hamiltonian Hb, 
acting on the states at sites i(b) and j(b). With this 
definition, the vertex weight for a vertex at propagation 
level p equals its contribution to the matrix element in 
Eq. ©, 

W(b p ,V p ) = (a(p)\H bp \a(p-l)). (9) 

For each leg I = 1, 4 of the vertex at propagation 
level p there is a leg V of another vertex at some prop- 
agation level p' , for which there is no other vertex in 
between the propagation levels p and p' acting on the 
corresponding site of the lattice. In particular, for a leg 
I = 1,2 (3,4), we find the corresponding leg, by moving 
along the decreasing (increasing) propagation levels, un- 
til we find the first vertex and leg corresponding to the 



same lattice site. Doing so, the periodic boundary of the 
propagating state must be taken into account, so that 
upon moving beyond p' = A we return at p' = 0. Each 
leg then has an outgoing and incoming link, such obtain- 
ing a bidirectional linked list for the vertices. In fact, 
this vertex list contains the complete information about 
the operator string. The operator-loop update performs 
changes in this vertex list along closed loops, resulting 
in a new operator string and basis state, i.e. a new MC 
configuration. 



B. Construction of operator loops 

Each operator-loop results from the stepwise construc- 
tion of a closed path through the vertex list, which rep- 
resents changes on the leg states and the bond-operator 
content of the visited vertices. For the remainder of this 
work we call the path generated in the vertex list a worm, 
which upon closure constitutes the operator-loop. 

During construction the worm is extended at one end, 
called the head, whereas the other end (called tail) re- 
mains static [2l[ . The body of the worm represents part 
of the new configuration. The goal of the following dis- 
cussion is to find rules for the motion of the worm head, 
which lead to efficient updates of the operator string. In 
analogy with the worm algorithm |(| we think of each 
intermediate worm configuration as being defined in an 
extended configuration space, which includes operator 
strings that in addition to bond-operators contain source 
terms for the worm ends. For example, in a bosonic 
model these would be the operators a, or a J , which decre- 
ment or increment the local occupation number. For spin 
models, these operators would be and S~ . In fact, 
this interpretation suggests to associate weight factors to 
both the creation (insertion) and closure (removal) of the 
worm, as well as the motion of the worm head, depending 
on the action of the corresponding operators (a, and aj 
in the bosonic example). 
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T (s) = Ai(s) 
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FIG. 3: The two possibilities of moving the worm after inser- 
tion of an initial pair A^Ao- (a) The operator Aq is moved 
upwards in propagation direction. As a result, the transfor- 
mation induced on the state s is To = A* (the new state is 
Tb(s) = A' (s)). (b) The operator A^ is moved downwards 
in negative propagation direction. The transformation in- 
duced on the state s is again To = Ag (the new state being 
To(s) = y4(,(s)). Dashed horizontal lines indicate where the 
operators were before they were moved. 



Within this view the creation of a worm corresponds 
to the insertion of two operators, which we denote A® 
and Aq (Aq being the Hermitian conjugate of A ). One 
operator stands for the worm head, the other the tail. 
We choose to insert these two operators randomly, either 
as AqAq (Fig.EK) or as A Ao (Fig.03) at a random point 
in the operator list, between two (non-identity) vertices 
with a certain probability, which will be specified below. 
Furthermore, the state of the vertex legs at the insertion 
point is denoted s\. 

Then we randomly choose to move one of the two op- 
erators, which thus becomes the head of the worm. The 
other operator remains at the insertion position, consti- 
tuting the worm's tail. The worm head is associated with 
a transformation Tq acting on the state si along the di- 
rection of propagation, so that si is changed to Tq(si), 
where T(s) denotes the normalized state [z| 



T(s) 



T(s) 
\T(s)\ 



(10) 



For example, consider the case where we insert a pair 
AqAo (Fig. |2t>) ■ If we choose to move the operator Aq 
along the positive direction of propagation, this corre- 
sponds to the case where a transformation To = Aq oper- 
ates on the state si in the positive propagation direction 
(Fig. |3Ji). If we choose to instead propagate the oper- 
ator Aq in the negative direction, this corresponds to a 
transformation T = Aq on the state s±, but now in the 
negative direction of propagation (Fig. Et 1 ) ■ More gen- 
erally speaking, the transformation T performed on the 



state depends on the propagated operator A, and on its 
direction of propagation in the following way: 



T = 



A* , for positive direction of propagation, 
A, for negative direction of propagation. 



(11) 

A proposed insertion of the worm (the pair of opera- 
tors) is accepted with a probability Pmsert(To, si), which 
depends on the effective transformation To and the state 
s\. This insertion probability is determined by the re- 
quirements of detailed balance, and will be discussed in 

section IrnTTTI 

Once these initial decisions are made, the worm head 
is propagated to the next (non-identity) vertex V\ in the 
operator string along the current direction of propaga- 
tion. The worm enters vertex V\ on the entrance leg 
l\ G [1,2,3,4], which is currently in the state s\ before 
the passage of the worm, and will be modified to Tq(s\) 
by the action of the worm head. At the vertex V\ the 
worm chooses an exit leg l[ according to certain proba- 
bilities as discussed in the next section. Depending on the 
particular exit leg l[ (i) the direction of the worm's prop- 
agation may change, (ii) the operator corresponding to 
the worm head may be hermitian conjug ated (A — >• A T ) 
or stay the same (A — > A), and as a consequence of (i), 
(ii), and Eq. i|ll|) the type of transformation performed 
by the worm head may be inverted (T — > T T ) or remain 
unchanged (T — * T). 

We denote the new operator that is carried by the 
worm head after passage of V\ by Ai, and the corre- 
sponding transformation T\. The state of the exit leg l[ 
is denoted s[ before the passage of the worm, and Ti(s[) 
after the worm action. 

For general models the modifications (i) and (ii) can 
occur independently. For a model with conservation 
laws (e.g. of the number of particles for bosonic mod- 
els or of magnetization for spin models), these lead to 
the following restriction: if the direction of propaga- 
tion stays constant, the operator remains the same, so 
that Aq — ► Ai = Aq. Otherwise it is inverted, i.e. 

Aq -> A 1 = Aj. 

After the worm head leaves the first vertex V\ from 
the exit leg l[ it continues on to the second vertex V 2 , 
entering on leg l 2 - This inter- vertex propagation of the 
worm head proceeds along the connections within the 
quadruply-linkcd vertex list. 

The state on leg l 2 before the worm passes through is 
s 2 = s[, and will be transformed to Ti(s 2 ) upon passage. 
The worm head leaves V 2 from exit leg 1' 2 , where the leg 
state changes from s' 2 to T2(s' 2 ), T 2 being the transforma- 
tion which corresponds to the new operator A2 associated 
with the worm head after it passed V 2 . 

This process continues until the worm exits a vertex 
Viv from a leg l' N , and from there returns to the inser- 
tion point. There are two possibilities for the worm head 
to approach the insertion point: either the final trans- 



T f . In the 

first case we call the resulting operator-loop a "normal" 



formation Tn is the same as Tq, or T, 
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Bounce loop 

(a) 



Normal loop 
(b) 



FIG. 4: The two possible closings of a worm. The initial pair 
inserted was AqAo (Fig. |5Jj), and the first move the propaga- 
tion of Ao upward in propagation direction (Fig. , so that 
the initial transformation is To = Aq. (a) For a bounce loop, 



the final transformation Tn = TJ. (b) For a normal loop, 
the final transformation Tat = To. The dashed horizontal line 
indicates the initial position of the operator Aq. 



loop, otherwise a "bounce" loop (Fig. 0J. The bounce 
loop corresponds to the case in which the order of the 
operators after the return of the head to the tail has the 
same orientation as directly after insertion (Fig.^Jt). For 
a normal loop the relative order of the operators is in- 
verted (Fig. Et>). 

In the method presented there, the worm always stops 
when it has reached its starting point. We note that 
there are other schemes 113 where the worm does not 
necessary do so, but continues with a certain probability. 
In section ITlI C 31 we discuss the efficiency of our choice. 



III. DIRECTED LOOPS 
A. Generalized directed loops equations 

For the actual construction of the operator-loop we 
need to specify the probabilities for choosing exit legs at 
each visited vertex. In this section, we focus on how to 
derive generalized equations for these probabilities. 

Consider a worm entering a vertex Vj, which is entirely 
specified by the values of the states at its four legs, as 
well as the lattice-bond bi corresponding to this vertex 
(here bi denotes the bond type of the ith vertex). The 
state of the vertex before the entrance of the worm is 
Sj = ^(1)) ® |a(2)) ® |<7(3)> ® H4)) (Fig.©. 

The worm enters Vj from the entrance leg U, and 
exits from leg l\. The states at these legs before the 
worm passes are denoted Sj and Sj, respectively, i.e. 
s% = \ui(k)), and = |cr,*(^)). Both states are changed 



by the worm's passage, and become Tj_i(sj) and Ti(s-), 
respectively. Correspondingly, the total state of this 
vertex becomes Sj = \a(l)) <8> |ct(2)) ® |ct(3)) ® |ct(4)), 
where |5j(Z)) = \u l {l)) except for |oj(Zj)) = f l - 1 (s i ), and 
W l ))=T l {s>). 

We define P 6i (Sj,Tj_i -> T h U -> 1$ to be the condi- 
tional probability of exiting on leg l[ , given that the worm 
head enters on leg U . This "scattering" probability can in 
general depend on the bond type bi, the transformation 
of the worm head before (Tj_i) and after (Tj) passing Vi, 
the state Sj, and on the actual path of the worm through 
this vertex, i.e. the legs U and l[. For a model with con- 
servations laws, Ti is implicitly given by Tj_i and U and 
l[, as discussed in the previous section. For clarity we 
illustrate our notations in Figs. [3] and [III 

What are the possible values for the scattering prob- 
abilities so that the resulting operator-loop construction 
fulfills detailed balance? In the original operator-loop im- 
plementation , Sandvik showed that a generic solution 
for any model is to set P^ (Sj, Tj_i — > Tj, li — * Z-) propor- 
tional to W(bi, Sj), i.e. the weight of the vertex after the 
passage of the worm (this solution is often referred to as 
the heat-bath solution). However, this choice turns out 
to be inefficient in many cases, because of "bounce" pro- 
cesses (i.e. the worm head exits a vertex from the same 
leg from which it entered the vertex) • An algorithm 
which minimizes the number of these bounce processes is 
often more efficient [II [II, El [II [II . In this context, 
Syljuasen and Sandvik proposed the "directed loop" up- 
date [13, with probabilities P 6i (Sj,Tj_i -> Tj, l L -> Z<) 
chosen as to minimize or even eliminate bounces. These 
probabilities are derived analytically for spin- 1/2 mod- 
els in Ref. 01 1 an d a more general framework to obtain 
them is given in Ref. [T^ j. 

The optimization (with respect to the bounce mini- 
mization) of the scattering probabilities has to be per- 
formed under the constraint of fulfilling detailed balance 
for the resulting operator-loop update. Syljuasen and 
Sandvik showed that in order to fulfill detailed balance 
of the directed-loop update the following condition on 
the scattering probabilities is sufficient: 



W(&j,Sj)P iH (Sj,Tj_ 1 Tj, li -+ l[) 
W(b h % )P bi (Sj , T} -+ Ti, , k ) 



(12) 



which we refer to as the local detailed balance condition, 
since it demands detailed balance during each step of the 
worm head propagation. The original algorithm by Sand- 
vik [|, where PjSj,^ T i: U -» l' t ) cx W{b u %), 
obviously fulfills this condition. With this choice, the 
probabilities do not depend on the entrance leg Z,. This 
is not true for the bounce-minimized solution, which by 
definition results in direction-dependent scattering prob- 
abilities for the directed loop update. 

The idea behind the work presented here is to consider 
the motion of the worm head in the extended configu- 
ration space. We show that this leads to a general set 
of equations for the scattering probabilities, which also 
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Si : State of the leg li 
before the worm's transit 



Ti-i : Operator carried 
by the worm's head 





s[ : State of the leg l[ 
before the worm's transit 



Vertex i 
sitting on bond type hi 

Sj : Current legs' states 
(before worm's transit) 

W(bi,Si) : Vertex weight 



Tj_i(a f ) : State of the leg U 
after the worm's transit 



FIG. 5: (Color online) Worm entering the i-th vertex during its construction. 



Tt : New operator carried 
by the worm's head 




s\ : State of the leg l[ 
before the worm's transit 



Ti_i(sj) : State of the leg k 
after the worm's transit 



Tiis't) : State of the leg l\ 
er the worm's transit 



Vertex i 
sitting on bond type bj 

X; : Current legs' states 
(after worm's transit) 

, £;) : Vertex weight 



FIG. 6: (Color online) Worm leaving the i-th vertex during its construction. 



guarantee detailed balance. These generalized equations 
have solutions that allow us to further reduce the bounce 
probabilities, and to even eliminate bounces in large re- 
gions of parameter space. 

If we consider the worm construction process in the 
extended configuration space, it appears natural to view 
the worm head as an operator acting on the local states 
of the world-line configuration, and to assign the corre- 
sponding matrix element as an additional weight factor 
to its propagation. The worm head matrix element is 
(f(s)\T\s). Let us denote by /(T, s) = (f(s)\T\s) the ad- 
ditional worm weight factor that will be used in the gen- 
eralized equations. Here T denotes the transformation 
corresponding to the worm head, and s the local state of 
the world- line configuration, where the worm head acts. 
Even though f(T, s) is always equal to the worm head 
matrix element in the scheme presented in this paper, we 
use this notation such that one can recover the standard 



directed loop framework by putting f(T, s) equal to 1 in 
all equations given below. 

With this definition of f(T, s), the following hermitic- 
ity condition is then fulfilled for all T and s : 

f{T\f{s)) = f{T,s). (13) 

We also denote the weight of the worm head before it 
enters the vertex Vi by /(Tj_x,Sj), depending on both 
the transformation and the state Sj. In the extended 
configuration space, the local detailed balance equation 
then reads: 

f(Ti-i, Si)W(bi, Vi)P bi (Si, Tj_! -» Ti, k -» li) = 

f{T}Ms[)W{b u %)P bi {%,T} - T}_ X X -> k) (14) 

which constitutes our generalized directed loop equation. 

Note that we recover the previous scheme of Syljuasen 
and Sandvik upon setting /(T, s) = 1 for all T and s 
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in this equation and in those given below. In the fol- 
lowing section, we prove that Eq. 114fl indeed guarantees 
detailed balance of the operator-loop update, as long as 
the worm weight /(T, s) fulfills the hcrmiticity condition, 
Eq. O. 



B. Proof of detailed balance 

The following proof of detailed balance uses the 
worm/ antiworm construction principle [TsL \2?ty . We first 
calculate the probability to create a worm w, hitting N 
vertices before coming back to the insertion point: 



init ' r insert (^Oi s l) 
N 



pw _ p p 
" 1 init 1 lr 

N 

JJn 4 (53 i) T i _i-^T i ,l i -»/0. (15) 



where P; n it denotes the uniform probability of choosing 
the insertion point in the operator string. 

Now we consider an antiworm w, traversing exactly the 
path created by w but in the reverse direction. The an- 
tiworm acts on the configuration that has been obtained 
after passage of the worm w. The antiworm thus com- 
pletely undoes the action of the worm w, leading back 
to the configuration prior to the insertion of the worm 
w. The antiworm is inserted at the same place as w, 
and its initial head operator is exactly the inverse of the 
last worm head operator, so that its insertion probabil- 
ity is Pinscrt(Tlf , Tn(s' n )). The probability to create the 
antiworm is thus 



P W — Pinit " -F'msertPjv') 7jv(Sjv)) 
N 

xJ{P bi {%,Tl -it^-fc)- (16) 



for all i < N. For a "normal" loop, we furthermore have 
T/v = To and s' N = s±, so that 

f(Tl f ,f N (s' N ))=f(T ,s 1 ), 

again using Eq. (|13Jl . In case of a "bounce" loop, where 
T/v = Tjj and s' N = Tq(si), we obtain the same relation, 

since /(T 0j T t (r (ai)) = f(T , Sl ). 

The factors of f(T, s) thus exactly cancel each other in 
the numerator and denominator of Eq. 1181 and we obtain 



pw I pw 



-Pjnscrt (Tp , S i ) 
-^insert 

1 W(bi,-Ei) 



n 



(19) 



Detailed balance is thus fulfilled, provided 

-Pinscrt (^0 , Sl) = ^insert {Tjf , T/V ( s 'n) ) ■ (20) 

For a "bounce" loop, where T^ = To and Tn(s' n ) = si, 
this condition is always fulfilled. In case of a "normal" 
loop, where Tv = T) and s' N = So, we need for Eq. I|20() 
to hold, that 



-^insert 



t(T ,S! 



(21) 



In other words, the probability to insert (at the same 
place) an antiworm that will undo exactly what a worm 
just did must be equal to the probability used to insert 
this original worm. If this condition is fulfilled for all 
transformations To and all possible states si, we obtain 
a detailed balanced operator-loop update. 



Operators, insertion probabilities, worm 
weights, and Green's functions 



The ratio of the two probabilities is 
-^insert (7b, si) 



pw i pw 



Pinsext(Tl r ,T N (s' N )) 

x TT Pb ^^ Ti - 1 zl Tiih it jjj (17) 

Up^T^TUl'^kY 



Using Eq. (|14|) . we obtain 

Pinsert(To, Si^ 



pw I pw 



7 > inscrt(T / ( r , Tn(s' n )) 



N 

n 



f(T i . 1 ,s i )W(b i ,'E i ) 



Since s[ = Sj+i we obtain, using Eq. (|13|) . 

f(Tl T i (s> i )) = f(T i ,s i+1 ) 



(18) 



Let us be more specific now, and discuss the kind of 
operators A that can be used as operator insertions, 
and which corresponding insertion probabilities fulfill Eq. 
(|2ip. We focus on two cases: quantum spin-S 1 and soft- 
core bosonic systems. 

For a quantum spin-5 1 system, the local state at a given 
site is given by the projection of the spin value at that 
site, e.g. onto the z axis. We denote this projection by 
m which can take 25 + 1 values in the range —S, —S + 
1,...,S-1,S. 

For bosonic systems, the local state is given by the 
number n of bosons at the site. If we truncate the Hilbcrt 
space by restricting the number of bosons per site to a 
maximum value iV max , n can take integer values in the 
range 0, . . . , N max . 

What are the possible operators Ao to be used in the 
operator pair insertion for these models? In many cases, 
a good choice is to construct so called ±1 worms: A +1 
(—1) worm head acting on state s changes it to s + 1 
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(s — 1). The operators Aq associated with the worm ends 
are then simply the creation (annihilation) operators cr 
(a) for bosons, and the ladder operators S + (S~) for 
spins, respectively. 

1. Insertion probabilities 

For ±1 worms, Eq. becomes Pinsert (a, n + 

1) = Pi nS ert ? n ) m the case of bosonic models and 
Pinscrt(S~ , m + 1) = P in s CI t(S + , m) for spin models. 

For a spin-5 1 model, we cannot insert a +1 (—1) worm 
onto a given initial state with m = S (m = —5). Since 
we always want to create a worm in all other cases, we 
propose the following insertion probabilities: 

Puw*(S ± ,m)= 1 ~ S ™< ±S . (22) 

If S = 1/2, we use Pinsert (S^ , m) = 5 m Tl / 2 instead, so 
always inserting a worm. 

For a softcore bosonic model limiting the maximum 
number of bosons per site to N max , we equivalcntly use: 

-Pinsert (a\«) = ^ 

-Pinsert (a, n) = - — y^-. (23) 

For hardcore bosons (N max = 1), we instead use 
Pinsert (a t ,n) = S n>0 and Pi ns ert (a, n) = S n>Nm ^, thus al- 
ways inserting a worm. 

These insertion probabilities differ from the weight as- 
signed to the operators in the extended configuration 
space, namely the matrix elements of these operators (see 
subsection IIII C 2|) . As suggested in Rcf. |f2f, it is possi- 
ble to set Pnsert(P, s) proportional to (T(s)\T\s), similar 
to the worm algorithm @. Indeed, this choice satisfies 
Eq. J23J- 

2. Worm weights 



We note, that the operators used here (corresponding 
to ±1 worms) are not unique, as we can for example also 
employ ±2 or ±3 worms. 

It is also possible that other choices of weights such 
that f(T, s) is not equal to (T(s)\T\s) might lead to more 
efficient algorithms. Indeed, in the proof of detailed bal- 
ance, the only requirement on f(T,s) is Eq. I|13|) . How- 
ever, the above choices naturally appear within the ex- 
tended configuration space, and lead to algorithms with 
less bounces, as will be shown below. 

3. Stopping probability 

We propose to always close a worm when the worm 
head returns to the insertion point. It is possible, as 
noted in references 0,0], to not necessary do so, but 
to offer the worm the possibility to continue depending 
on the value of the final state. As a consequence, the 
worm insertion probabilities need to be changed accord- 
ingly, in order to retain detailed balance. It is not a 
priori clear which approach results in a more efficient al- 
gorithm. Only precise studies of autocorrelation times 
could answer this question for each specific model and 
set of parameters, which is however well beyond the scope 
of this work. Instead we present an intuitive argument, 
why we expect closing worms immediately to be more 
efficient: 

The goal of using worm updates is the generation of 
large non-local changes in each MC configuration, in or- 
der to decorrelate two consecutive measurements. A pre- 
cise quantification of this decorrelation effect in terms of 
CPU time must take into account the worm size. Mak- 
ing a long worm and thus obtaining large decorrelation 
effects should grossly be equivalent to making two short 
worms with only half the decorrelation. However, if after 
the first encounter of the initial point the worm has al- 
ready resulted in large enough decorrelation, it becomes 
less meaningful to continue this worm, as we can already 
perform an independent measurement instead of spend- 
ing more CPU time for the construction of a longer worm. 



In the extended configuration space, where the worm 
head is associated with an operator acting on the local 
state in the world-line configuration, the worm weights 
are equal to the matrix elements, f(T, s) = (T(s)\T\s). 

To be more specific, consider employing ±1 worms for 
a spin model. Then T can be S + or S~ , so we obtain 

f(T,s) = f(S ± ,m) = (milium) 

= ^S(S + 1) -m(m±l).(24) 

For a bosonic model with ±1 worms, T is either a or 
a' and thus 

f(T s) = / ^ flt '") = (n±lW\n) = v/F + I) 
A ,S> \ f(a,n) = (n±l|o|n) = ^ 

(25) 



4- Measuring Green's functions 

With the above choices, the measurement of Green's 
functions during the worm construction needs to be 
slightly modified in order to account for the presence 
of the explicit worm weights in the worm's propagation. 
For a detailed account on how the Green's functions mea- 
surements are performed using heat bath and standard 
directed loops with the insertion and stopping probabil- 
ities of Sec. IIII C f I and IIII C 31 respectively, we refer to 
Ref. p3 |. Here, we only summarize the main point: In 
the standard directed loop algorithm, the value of the 
Green's function measurement for a given distance (in 
space and imaginary time) between the worm head and 
the worm tail equals the product of the matrix clement 
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of the operator inserted at the head of the worm (which 
would be in our notations (T(s)\T\s), where T denotes 
again the transformation corresponding to the worm 
head, and s the local state of the world-line configuration, 
onto which the worm head acts) times the matrix ele- 
ment inserted at its tail (in our notations (To(so)|Tol s o)j 
where Tq denotes the transformation corresponding to 
the (static) worm tail, and sq the local state of the 
world-line configuration, onto which the worm tail acts). 
For a detailed graphical illustration of this measurement 
process we refer to Ref. j24|- The only modification to 
this scheme, which arises from using generalized directed 
loops is as follows: In the generalized directed loop algo- 
rithm, the propagation of the worm head fulfills detailed 
balance in the extended configuration space. The worm 
head's matrix elements arc thus taken into account in 
the probability to obtain a given configuration (in space 
and imaginary time) between the worm head and the 
worm tail. Therefore, in the generalized directed loop al- 
gorithm the value of the Green's function measurement 

equals (Tb(s )|To|so) ■ (fo( S o)\T \ So ) = (T (s )\T \so} 2 . 
Note, that this is independent of T and s, and involves 

only the value of the matrix element (Tq(sq)\To\sq) 2 from 
the static worm tail. The Green's function measurement 
in the generalized directed loop algorithm thus requires 
significantly less evaluations of matrix elements, or ac- 
cesses to their look-up table. 

If one would furthermore choose Pi nser t(T, s) propor- 
tional to (To(so)|To|so), similar to the worm algorithm 
discussed in Sec. IIII C II the value of each Green's func- 
tion measurement would be equal to 1, as for the worm 
algorithm In fact, this way the matrix elements of 
both the worm head and tail would be accounted for 
explicitly during the construction of the worm and its 
propagation. 

IV. NUMERICAL STRATEGY 

In the preceding sections we derived generalized con- 
ditions on the scattering probabilities Pb i (Sj,Ti_i — > 
Ti, li — ► which describe the motion of the worm head 
at each vertex during the worm construction. We now 
look for solutions of Eq. I|14|) , for which the bounce prob- 
ability for each possible vertex configuration is as small 
as possible. We expect this to lead to an optimal algo- 
rithm in terms of autocorrelation times. Here, we explain 
how to numerically solve Eq. I|14|l for such probabilities. 
Note, that the numerical procedure outline below also ap- 
plies to the standard directed loop approach, by simply 
setting f(T, s) equal to 1. 

For a given vertex configuration we can construct from 
the scattering probabilities P^X^T,-! — > T\ih — ► Vj) a 
4x4 "scattering matrix" P, whose elements arc 

P kl = Pb^uT^ ^ T h l ^ k), 

so that the element Pki corresponds to the probability of 



exiting from leg k, given that the worm head entered the 
vertex on leg I. 

There are various constraints on the matrix P. In par- 
ticular, Eq. i|14|) constraints the elements of P according 
to detailed balance. Furthermore, in order to be inter- 
preted as probabilities, all the matrix elements of P must 
be contained within [0, 1], that is to say 

0< P k i < 1 Vfc.Z- (26) 
Since the worm always leaves a vertex, we must have 

5Z P « = 1 ( 27 ) 

k 

i.e. each column of P must be normalized to 1. 

It is possible to add additional symmetry constraints 
on P. While these are not necessary conditions, they 
might increase the numerical accuracy in looking for the 
matrix P. Given an entrance leg I, let us call two legs k 
and h equivalent, k ~ h, if the product / • W on the right 
hand side of Eq. I|14|) gives the same value, independent 
of choosing k or h as the exit leg. If two equivalent legs 
k and h both differ from the entrance leg I, they must be 
chosen as the exit leg with equal probability, i.e. 

Pki = Phi Vl^k,h, k~h. (28) 

A similar condition can be derived for equivalent entrance 
legs by consideration of the reversed process. 

After characterizing the constraints on the scattering 
matrix P, we can now formulate our optimization crite- 
rion in terms of P. Our goal is to construct an optimal 
directed-loop update, and as argued before 0, 0, Q] 
we aim to minimize the number of bounce processes, 
i.e. the bounce probabilities. In our P-matrix language, 
this means that we need to minimize all diagonal ma- 
trix elements. In order not to introduce any additional 
bias among the different bounce probabilities, we require 
for the actual implementation to minimize the trace of 
the matrix P, thereby treating all bounce probabilities 
equally, 

minimize y^Pu. (29) 
i 

In previous studies E2, [H Q E H3 , sets of proba- 
bilities satisfying all these conditions were obtained ana- 
lytically for specific models. From the constraints (|14I26I 
1271280 and the optimization goal (|29f) . we see that we ar- 
rive in front of a linear programming problem for each 
scattering matrix P [Tsj . This can be be solved numer- 
ically using standard linear programming routines |25j |. 
In most cases, we found that at most one diagonal ma- 
trix element was non zero. We also note here that the 
linear programming routines picks one of the many pos- 
sibly equally optimal (with respect to condition (|29|l ) so- 
lutions depending on its initial search point. This issue 
will be further discussed later. 
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This direct way of looking for the optimal (in terms of 
bounce minimization) solutions of the directed loop equa- 
tions is not specific to any model and needs no preced- 
ing analytical calculation. It allows for a rather generic 
implementation of the SSE algorithm, where after imple- 
mentation of the Hamiltonian, a standard minimization 
routine |25j can be employed in order to obtain the scat- 
tering matrices prior to starting the actual simulation. 

V. ALGORITHMIC PHASE DIAGRAMS 

In this section, we apply the preceding method to the 
simulation of quantum systems which have been exten- 
sively studied previously using the SSE QMC method. 

To ensure that all diagonal matrix elements of the bond 
Hamiltonians are positive, we add a constant C = C + e 
per bond to the original Hamiltonian where Co is the 
minimal value for which all diagonal matrix elements arc 
positive, and e > 0. Wc will sec that usually a finite value 
of e is required in order to allow for regions in parameter 
space which are completely bounce-free. In general, we 
find that increasing e results in lower bounce probabili- 
ties. However, as the size of the operator string grows 
with e, this leads to increasing simulation times: there 
is clearly a tradeoff between more bounces but less CPU 
time (small e) and less bounces but more CPU time (large 
e). We expect that there is no general rule how to a pri- 
ori choose the value of e in order to obtain the smallest 
autocorrelation times. 



A. Heisenberg model 

First wc consider the easy-axis spin-5 Heisenberg 
model in an external magnetic field h, 

where A denotes the easy-axis anisotropy, and the 
first sum extends over all nearest neighbors on the d- 
dimensional hypcrcubic lattice. 

Numerically scanning the parameter space (A, h > 0), 
we search for regions where our optimization procedure 
finds bounce-free solutions (i.e. J^i Pu = f° r au " allowed 
vertices). From this procedure we obtain the algorithmic 
phase diagram displayed in Fig. [7J 

We find a finite region of the parameter space (|A| + 
h/(2dS) < J) which corresponds to bounce-free solu- 
tions of the generalized directcd-loop equations. Within 
this region typically one needs e > SJ/2. However, for 
A = ±J, we find bounce- free solutions also for e = 0. 
Outside the bounce-free region at least one of the scat- 
tering matrices does not allow for a traceless solution. 

For 5 = 1/2, Syljuasen and Sandvik analytically found 
the same bounce- free region |l2j . By monitoring the pa- 
rameter dependence of the finite bounce probabilities, we 




FIG. 7: (Color online) Algorithmic phase diagram for an easy- 
axis anisotropic spin-S Heisenberg model in a magnetic field h 
field, on a d-dimensional cubic lattice with nearest neighbor 
exchange J. The easy-axis anisotropy is denoted A. The 
shaded region indicates those parameters, for which a bounce- 
free solution of the generalized directed loop equations can be 
found. 

verified that our numerical approach indeed yields their 
analytical solution. 

Syljuasen recently extended the directed loop frame- 
work proposed in 0] to spin-5 1 models 0] • Within our 
framework, his ansatz corresponds to setting f(T, s) = 1. 
He finds no region in parameter space where the directed 
loop equations allow for bounce-free solutions for any 
5 > 1. We have verified this by setting f(T,s) = 1 and 
find that for 5 > 1, there is indeed no bounce free solu- 
tion, using Syljuascn's choice. For 5 — 1/2 and 5=1, 
Syljuasen recovers the phase diagram shown in Fig. |7| 
This reflects the fact that for 5 = 1/2 and 5 = 1 all non- 
zero matrix elements of the 5 ± operators are equal and 
thus the factors f(T, s) cancel out of the generalized di- 
rected loop equations, making our generalization equiva- 
lent to the standard approach. For 5 > 1 the generalized 
directed loop equations, including the worm weights as 
extra degrees of freedom, however allow for more bounce 
free solutions. 

The algorithmic phase diagram shown in Fig. [7] was 
also found to hold for the coarse-grained loop algo- 
rithm 0|. This suggests that the numerically deter- 
mined scattering probabilities arc similar to those of 
the coarse graining approach. This equivalence is also 
pointed out more clearly in Ref. pt| . 

B. Softcore Bosonic Hubbard model 

Here we present algorithmic phase diagrams for the 
bosonic Hubbard model, with Hamiltonian 

H = —t afaj + a,a] + U/2 n,(n, — 1) — fi^~] h,, 

(31) 

where the aj (a^) denote boson creation (destruction) 
operators on sites i, ni — ajaj is the local density , t is 
hopping amplitude, U the on-site interaction, and \x the 
chemical potential. 
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1 2 cr(JV -l) 

FIG. 8: (Color online) Algorithmic phase diagram for the 
bosonic Hubbard model on a d-dimensional cubic lattice with 
nearest neighbor hopping t, onsite interaction strength U, and 
a chemical potential /i. N max denotes the cutoff in the local 
occupation number. The shaded region indicates the regime 
of bounce-free solutions of the generalized directed loop equa- 
tions. 



We need to restrict the simulation to a maximum num- 
ber N max > 1 of bosons per lattice site, in order to obtain 
positive diagonal bond Hamiltonian matrix elements. For 
the hardcore bosonic case -/V max = 1, we refer to the 
preceding section, since the hardcore bosonic Hubbard 
model exactly maps onto a spin-1/2 antiferromagnetic 
Heisenberg model. 

Using our numerical optimization technique we arrive 
at the algorithmic phase diagram shown in Fig. El There 
is a finite region of bounce-free solutions to the directed 
loop equations. However, this region shrinks upon in- 
creasing N max , and we need to allow e > N max t/2 in 
order to recover the complete bounce free region. 

Syljuascn studied the directed loop equations for 
bosonic models and did not obtain bounce-free regions 
for any iV max > 1 The same result was obtained in 
Ref. La- A gain this indicates the importance of allowing 
the additional weight factors within our approach. 

Smakov et al. [l5j presented a coarse-grained loop algo- 
rithm for the simulation of softcore bosons. They present 
results for free bosons, for which no constraint on the oc- 
cupation number is necessary within the SSE approach. 
Since their method proceeds directly in the -/V max — > oo 
limit, we expect that using their algorithm there will re- 
main no bounce-free regions for finite on-site interaction. 

For a softcore bosonic model without a cutoff on the 
maximum value of bosons per site, it is also possible to 
perform simulations by imposing an initial cutoff iV max , 
which is then adjusted during the course of the ther- 
malization process. With the numerical procedure at 
hand, it is easy to recalculate the scattering matrices P 
when needed, namely when the current cutoff becomes 
too small, and needs to be increased. 



VI. AUTOCORRELATION RESULTS 

The results presented in the previous section suggest 
that the generalized directed loop equations lead to ef- 
ficient update schemes. In particular, in many cases we 
could greatly extend bounce-free regions in parameter 
space using the generalized directed loop method. 

It is generally expected that reducing bounce processes 
leads to more efficient algorithms. In this section, we 
therefore compare the efficiency of an arbitrarily picked 
solution to the generalized directed loop algorithm to 
earlier approaches: the original heat bath choice for the 
scattering probabilities by Sandvik 0, and the directed 
loop approach by Syljuasen and Sandvik 0, El This 
comparison is performed using the example of the mag- 
netization process of quantum spin chains. 

We define each MC step to consist of a full diagonal 
update, followed by a fixed number N w of worms up- 
dates, where N w is chosen such that on average twice 
the number of vertices in the operator string are hit by 
those worms. Wc perform a measurement after each such 
Monte Carlo step, and determine integrated autocorrela- 
tion times using standard methods 0|. 

In case the effort for a single MC step was the same 
for each of the three algorithms, the integrated autocor- 
relation time would establish a valid comparison between 
these algorithms in terms of CPU time. Suppose, how- 
ever, that a MC step of Alg. A took twice the CPU time 
than a MC step using Alg. B. In that case even with 
a 50% reduction of the autocorrelation time upon using 
Alg. A, both would be equally efficient, since in order 
to obtain a given number of independent configurations, 
the same CPU time would be needed. In the following, 
wc therefore present a measure of autocorrelations, which 
takes the effort of each update scheme into account in a 
machine independent way. 

For this purpose, we define the worm size w as the total 
number of vertices that have been visited by the worm, 
including those visited during bounce processes 0, • 
The number N w , calculated sclf-consistently during ther- 
malization, is then defined such that N w (w) ~ 2(n), 
where (n) is the average number of non-identity oper- 
ators in the operator string ((. . .) denotes MC averages). 
In counting N w we include worms that are immediately 
stopped. The number N w can fluctuate from one simula- 
tion to another, and more importantly depend on the un- 
derlying algorithm: indeed, the worms constructed using 
different algorithms arc not expected to be of the same 
size. In order to account for this difference in effort, wc 
multiply the integrated autocorrelation times by a factor 
N w {w)/{n), which is close to 2 by definition, but which 
might differ, depending on the underlying algorithm. 

The results presented below were obtained by the fol- 
lowing procedure: for each of the three algorithms we 
run simulations containing 10 6 MC steps and calculate 
integrated autocorrelation times tq for various obscrv- 
ables Q . From these we obtain effort-corrected autocor- 
relation times t — 7~o • N w (w) j (n) , leading to a machine- 
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FIG. 9: (Color online) Bounce probabilities using the heat 
bath algorithm, standard directed loops, and generalized di- 
rected loops for a L = 64 sites spin-3/2 XY chain in a mag- 
netic field h at (3 = 64/ J. A different scale is used for the 
heat bath algorithm. 



independent measure of efficiency. We applied the above 
procedure to the autocorrelations of the uniform mag- 
netization and energy of antiferromagnetic spin chains in 
finite magnetic fields, and present results for the spin-3/2 
XY and the spin-2 Heisenberg case. 



A. Spin-3/2 XY chain 

We simulated the spin-3/2 XY model (Eq. JHOJl with 
A = 0) on a L = 64 sites chain at an inverse temper- 
ature (3 = 64/ J, for fields from zero up to saturation, 
h = 3J. For the simulations presented here, we chose 
e = S J/2 — h/4 which is found to be the minimal value 
to have a bounce free algorithm for a XY chain in a field. 
The magnetic field dependence of the bounce probability 
is shown for all three algorithms in Fig. [5J The bounce 
probability is rather large (30 — 45% for all fields) for the 
heat bath algorithm and significantly reduced (to less 
than 2%) using the standard directed loop equations, 
while it vanishes all the way up to the saturation field 
using generalized directed loops. 

The rescaled autocorrelation times of the magnetiza- 
tion (r m ) and energy (te) are shown as functions of the 
magnetic field strength in Figs. 1101 and 1111 respectively 
Using the heat bath algorithm, r m increases upon in- 
creasing h, while te decreases. The uniform magne- 
tization of the MC configuration is updated only dur- 
ing the operator-loop updates, while the energy is not 
changed during this update step Therefore autocor- 
relations in the energy measurements are less sensitive 
to the efficiency of the operator loop update, and mainly 
decrease with field strength, due to increasing operator 
string lengths. In both the low and the high field region, 
the improvements of standard and generalized directed 
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FIG. 10: (Color online) Autocorrelation times for the uniform 
magnetization, measured using heat bath, standard directed 
loops, and generalized directed loops for a L = 64 sites spin- 
3/2 XY chain in a magnetic field h at (3 = 64/ J. 
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FIG. 11: (Color online) Autocorrelation times for the energy, 
measured using heat bath, standard directed loops, and gen- 
eralized directed loops for a L = 64 sites spin-3/2 XY chain 
in a magnetic field h at /3 = 64/ J. 



loops upon using the heat bath algorithm are clearly 
seen for both the energy and magnetization in Figs. EH 
and^] Within our scheme, we find small but not sig- 
nificant improvements over the standard directed loops, 
and for h ~ J, the bounce-free solution even results in 
slightly larger autocorrelation times than the heat bath 
method. 

This clearly indicates that one must include further 
strategies, besides the bounce minimization in order to 
obtain a better algorithm, as will be discussed in Scc. lVHI 



B. Spin-2 Heisenberg chain 

Next, we consider the isotropic (A = 1) antiferromag- 
netic spin-2 Heisenberg model in a magnetic field. We 
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FIG. 12: (Color online) Bounce probabilities using the heat 
bath algorithm, standard directed loops, and generalized di- 
rected loops for a L = 64 sites spin-2 antiferromagnetic 
Heisenberg chain in a magnetic field h at j3 = 64/ J. A differ- 
ent scale is used for the heat bath algorithm. 



FIG. 13: (Color online) Autocorrelation times for the uni- 
form magnetization, measured using the heat bath algorithm, 
standard directed loops, and generalized directed loops for a 
L = 64 sites spin-2 antiferromagnetic Heisenberg chain in a 
magnetic field h at (3 — 64/ J. The inset shows on a larger 
scale the autocorrelation times using directed loops at small 
values of the field. 



simulated a chain with L = 64 sites at /3J = 64 and for 
fields ranging from zero up to saturation at h = 4 J, and 
using e = SJ/2. In Fig. El the resulting bounce proba- 
bilities are shown as functions of magnetic field strength 
for the three algorithms. Similar to the previous case, 
the bounce probabilities are rather high using heat bath, 
~ 34 — 42%), whereas they are significantly reduced us- 
ing the directed loop algorithms (less than 1% in both 
cases). Even though the bounce probabilities are finite 
at h > for the generalized directed loop algorithm, they 
are smaller than for the standard directed loop algorithm. 
Furthermore, in the limit of zero field, using generalized 
directed loops leads to a vanishing bounce probability. 

In Fig. 1131 and Fig. we present results for rescaled 
autocorrelation times of the magnetization (r ro ) and en- 
ergy (te). The dependence of r m on the magnetic field 
has a similar tendency for all three algorithms: starting 
from a small value at zero field, tm is sharply peaked 
at h ~ 0.1J, and decreases rapidly upon further in- 
creasing the field strength, reaching an almost constant 
value. This sharp peak around h ~ 0.1J probably cor- 
responds to the closure of the Haldane gap (estimated 
as A H = 0.08917(4)J for the spin 2 chain j^|) by the 
magnetic field. We observe that tm is larger by nearly 
a factor of 3 using heat bath rather than directed loops. 
This is expected given the larger bounce probabilities in 
Fig. El We find that independent of the magnetic field 
strength, r m is less for the generalized directed loop al- 
gorithm than for the standard one. 

Concerning the autocorrelation times te shown in 
Fig. El we reach similar conclusions as for the spin-3/2 
XY case: the autocorrelation times of the energy are re- 
duced by a factor around 2 from those using the heat 
bath algorithm. 
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FIG. 14: (Color online) Autocorrelation times for the energy, 
measured using heat bath, standard directed loops, and gen- 
eralized directed loops for a L = 64 sites spin-2 antiferromag- 
netic Heisenberg chain in a magnetic field h at (3 = 64/ J. 



VII. OPTIMIZING DIRECTED LOOP 
ALGORITHMS 

The results in the previous section clearly indicate 
that minimizing bounces alone is not sufficient to obtain 
an efficient algorithm, since the bounce-free (or bounce- 
minimized) solution is not unique [T^ . ll7j . The numerical 
lineal programming solver employed picks a particular so- 
lution, which might not be the optimal one in terms of 
autocorrelations. In this section, we present supplemen- 
tary strategies aiming at locating more efficient solutions. 
We note that these strategies are not specific to the gen- 
eralized directed loop scheme presented in the previous 
sections, but can also be used to optimize the standard 
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FIG. 15: (Color online) The different paths a worm can take 
across a vertex: "Bounce", "Jump", "Straight" or "Turn". 



TABLE I: Autocorrelation times for the uniform magnetiza- 
tion (tm), the staggered magnetization (tm„), and the energy 
(te) for the generalized directed loop algorithm applied to a 
L = 64 sites spin-3/2 XY chain in a magnetic field h — 0.6 
at f3 = 64/ J, obtained using algorithms where supplementary 
strategies have been used after minimization of bounces, as 
explained in the text, for e = 3/4 J. 



Supplementary Strategy 


tm 


T M 3 


te 


Maximize Jump 


2.9 


20.4 


6.4 


Minimize Jump 


22.9 


12.5 


16.9 


Maximize Straight 


2.9 


6.4 


9.4 


Minimize Straight 


12.4 


22.5 


13.2 


Maximize Turn 


45.7 


22.4 


25.2 


Minimize Turn 


2.7 


23.6 


6.6 



of bounces. Doing so, we obtain six sets of scattering 
matrices, each corresponding to one of the above opti- 
mization goals. 



directed loop approach [I^ITsTl. 



A. Supplementary strategies 

Apart from the "bounce" path, where the worm back- 
tracks, there are three other paths that a worm can take 
across a vertex. We denote these other paths as "jump" , 
"straight" and "turn" j^l- See Fig. QSJ for an illustration 
of these definitions. 

Once the bounces have been minimized or even elim- 
inated, one might consider the effects of the remaining 
three paths of the worm-scattering process on the auto- 
correlation times. A practical means of doing so is as fol- 
lows: First, we use linear programming to minimize the 
bounces (Eq. I29[) and to obtain for each vertex configu- 
ration the lowest value of the bounce (denoted b, where 
b can take a different value for each possible vertex). In 
a second step, we then impose the condition Pu = b 
as a new constraint, in addition to Eqs. i|26l I27l2%jl . so 
that any feasible solution will be in the optimal subspace 
with respect to bounce minimization. 

We then consider new optimization goals, each chosen 
from the six following possibilities: we could minimize or 
maximize the jump, straight or turn probabilities. The 
jump probabilities simply correspond to the scattering 
matrix elements P14, P41, P23, P32, the straight proba- 
bilities to P13, P31, P24, P42 and the turn probabilities 
to P12, P21, P34, Pi3. For each of the six different strate- 
gies, we use linear programming with the additional con- 
straint to minimize or maximize the sum of these ma- 
trix elements for each vertex configuration. Then we use 
the resulting scattering matrices in the SSE algorithm. 
Note, that due to the additional constraint, we explicitly 
ensure that these algorithms will have a minimal number 



B. Results 

As an example of testing the efficiency of these strate- 
gies, we consider the £ = 3/2 XY chain in the parameter 
regime, where we found the generic solution of the gen- 
eralized directed loop equations in Sec. I VI Al to perform 
worse than the heat bath solution. In particular, we con- 
sider a chain with L = 64, (3J = 64, e = SJ/2, and a 
value of the magnetic field h = 0.6 J. 

In Tab. [I] we present results for the autocorrelation 
times of the magnetization (tm), staggered magnetiza- 
tion (tm s ) and energy (te) from using each of the six 
different strategies. 

The subspace of bounce-free solutions contains algo- 
rithms with autocorrelation times varying by about an 
order of magnitude; this indicates that a solution taken 
from this subspace without further guidance in general 
will not be the optimal one. 

From Tab. Q] we furthermore find, that the optimal ad- 
ditional strategy depends on the observable of interest. 
For example, in order to minimize the autocorrelation 
times of the energy, maximizing jumps is more efficient 
than maximizing the straight path, whereas for the stag- 
gered magnetization the two strategies perform opposite. 
This indicates, that in general it will not be possible to 
obtain a unique optimal strategy beyond the minimiza- 
tion of bounces. 

Minimizing bounces appears reasonable from an algo- 
rithmic point of view, in order to prevent undoing pre- 
vious changes to a QMC configuration. However, auto- 
correlations are also related to the physical phases of the 
model under consideration, and thus less well captured 
by a generic local prescription for the worm propaga- 
tion. In practice, the most efficient way to proceed for a 
given model will be to perform simulations for each dif- 
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FIG. 16: (Color online) Autocorrelation times for the uniform 
magnetization, measured using heat bath, standard directed 
loops, generalized directed loops, and the optimal solution 
(see text) for a L — 64 sites spin-3/2 XY chain in a magnetic 
field h at = 64/ J, for e = 3/4J. 

ferent strategy on small systems, in order to determine 
the optimal strategy for the observable of interest before 
performing production runs on larger systems. 

In order to illustrate the reduction of autocorrelation 
times that can be achieved using this scheme, we finally 
consider the spin-3/2 XY chain throughout the whole re- 
gion of magnetic fields, h = OAJ — 1.3 J, where we found 
the unexpected increase in the autocorrelation times (see 
Fig. llO|) . The resulting minimal autocorrelation times for 
the magnetization are shown in Fig. 1161 along with the re- 
sults for the autocorrelation times using heat-bath, stan- 
dard and generalized directed loops (without additional 
constraints). Our results clearly demonstrate, that the 
optimal algorithm gives rise to much better performance, 
in particular curing the autocorrelation time anomaly 
found in the previous section. We find that the opti- 
mal strategy depends on the magnetic field strength: for 
example, we find the best strategy to be (i) maximizing 
jumps for fields strengths h = OAJ, 0.5 J, 0.7 J, 1.1 J, and 
1.3J, (ii) minimizing turns for h = 0.6J, 1.0J, and 1.2J, 
and (iii) maximizing straight moves for h = 0.8J and 
0.9J. 



VIII. CONCLUSION 

In this paper we presented a generalized approach to 
the construction of directed loops in quantum Monte 
Carlo simulations. Viewing the worms ends not as arti- 
ficial discontinuities, but as physical operators with cor- 



responding weights we arrived at generalizations of the 
directed loop equations. Using linear programming tech- 
niques to solve these equations we can avoid the ana- 
lytical calculations needed in previous approaches, and 
arrive at a generic QMC algorithm. 

The generalized directed loop equations allow bounce- 
free solutions in larger regions of parameter space, but 
measurements of autocorrelation times for several models 
showed that minimizing bounces is not always sufficient 
to obtain an efficient algorithm. 

We therefore proposed a different means of further 
optimizing directed loop algorithms inside the subspace 
of bounce-minimal solutions. Additional strategies were 
presented, the use of which improves the performance up 
to an order of magnitude. However, the optimal strategy 
in general depends on both the model and the observable 
of interest. One therefore needs to perform preliminary 
simulations to find out which supplementary strategy is 
optimal for a given problem before turning to long calcu- 
lations, in order to account for the physical phase realized 
in the specific parameter regime. 

A recent paper j2^| discussed issues similar to the ones 
addressed here: can one obtain strategies that improve 
the efficiency of QMC algorithms beyond the directed 
loop scheme ? In our understanding of their work, the 
authors of Ref. pfjj propose to always keep a non-zero 
bounce probability to vertices with the largest weight. 
They then provide a precise form of the scattering matri- 
ces. In Ref. 0, Syljuasen also proposed to keep a non- 
zero bounce probability for the vertex with the largest 
weight in situations where he did not find bounce-free 
solutions. The main difference between the approach of 
Pollet et al., and Syljuasen thus concerns the off-diagonal 
elements of the scattering matrix. As shown explicitly in 
Sec. I VIII the off-diagonal matrix elements strongly af- 
fect the efficiency of the algorithm in a parameter- and 
observable-dependent way. This indicates that there will 
be no simple rule for the construction of the scattering 
matrices, which perform optimal in all cases. Similar 
conclusions were reached in Ref. [2^. A full SSE code 
featuring the implementation of the generalized directed 
loop technique described in t he p resent paper is available 
as part of the ALPS project |29j . 
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